Abstract



Feature trees generalize first-order trees (which are called ground terms in the general
framework of universal algebra). Namely, argument positions become keywords ("features")
from an infinite symbol set $\mathcal{F}$. A constructor symbol becomes a node symbol that can occur
with arbitrary and arbitrarily many argument positions. Feature trees are used to model flexible
records; the assumption that $\mathcal{F}$ is infinite accounts for dynamic record field additions.

We develop a universal algebra framework for feature trees. We extend the classical set- 
defining notions: automata, regular expressions and equational systems, and show that they 
coincide. This extension of the regular theory of trees requires new notions and proofs. 
Roughly, a feature automaton reads a feature tree in two directions: along its branches and 
along the list of the direct descendants of each node. The second direction corresponds to an 
automaton on a commutative monoid (over an infinite alphabet). 

One motivation for this work stems from the fact that, in a type system for the programming 
language LIFE, the types denote sets of feature trees. Operations needed for type checking 
can now be implemented by the corresponding automata algorithms. 



1 Introduction

In this section, we will give some background and motivation ("the task") and then outline the 
rest of the paper ("the method"). 

The Task. We describe a specific formalism of data structures called feature trees. They 
are a generalization of first-order trees, also called constructor trees or the elements of the 
Herbrand universe. Since trees have been useful, e.g., for structuring data in modern symbolic 
programming languages like Prolog and ML, the more flexible feature trees have an interesting 
potential. Precisely, feature trees model record structures. They form the semantics of record 
calculi like [AK86], which are used in programming languages [AKP91b] and in computational 
linguistics (cf., the book [Car92]). In the logical framework for record structures of [AKPS92], 
they constitute the interpretation of a first-order theory, which is completely axiomatizable, 
and hence decidable [BS92]. 

As graphs, feature trees are easily described as finite trees whose nodes are labeled by node
symbols (instead of constructor symbols), and whose edges are labeled by feature symbols
(instead of being numbered), such that all edges outgoing from the same node are labeled by
different feature symbols. Thus, symbolic keywords called features denote the possible argument
positions of a node. They uniquely access the node's direct subtrees. Any node symbol can label
a node with any finite number of features attached to it.

Although thoroughly investigated [AK86, Smo92, BS92, AKPS92], also in comparison with 
first-order trees [ST92], feature trees have never been characterized as composable elements 
in an algebraic structure, i.e., with operations defined on them. Also, up to now, there has been 
no corresponding notion of automaton. This device has generally proven useful for systems 
calculating over sets of elements. 

The practical motivation for such a system comes from the possibility of defining a hierarchy 
of types denoting sets of feature trees. For its use in a logical programming system employing 
feature trees such as LIFE [AKP91b], we need to compute efficiently the intersection of two 
types (roughly, for unification). Concurrent systems, in connection with control mechanisms 
such as residuation or guards [AKP91a], require furthermore an efficient test of the subset 
relation (matching). Thus, we need to provide a formalism defining the types in a way that is 
expressive enough and yet keeps the two problems decidable. Such a formalism can be given, 
for example, as a system of equations and the corresponding automata notion with Boolean 
closure properties and decidable emptiness problem. 

Also, if we want to extend the techniques of type systems for logic programming, where types 
denote sets of trees (cf., the book [Pfe92]), to LIFE, where types will instead denote sets of 
feature trees, we first have to provide a corresponding formal framework. 

A major difficulty in the construction of a suitable algebraic framework for feature trees (i.e.,
with the property that automata and equational systems coincide¹) comes from the fact that the
set $\mathcal{F}$ of features, i.e., of possible argument positions of a node accessing its direct subtrees, is
infinite. The infiniteness of $\mathcal{F}$ is, however, an essential ingredient of the formal frameworks
modeling structures. A practical motivation of the infiniteness is the need to account for the
possibility of dynamic addition of (arbitrarily many) record fields to a value. It turns out
that this semantical point of view has advantages for implementation as well. Namely, the
correctness of the efficient algorithms for entailment and for solving negated constraints on
feature trees [AKPS92] relies on the infiniteness of $\mathcal{F}$.

¹ We note that the expressiveness of tree automata is equal to that of equational systems for the free term
algebras over finite signatures; it is strictly weaker in the case of infinite signatures for all tree species, including
those considered in [Cou89, Cou92].

The Method. The first step in solving the problem described above is to build an appropriate 
algebraic framework. Such a framework is provided by universal algebra in the case of 
first-order trees. Formally, these are the elements of the free algebra over a given signature 
of function symbols (finite or infinite, cf, [Mah88]). This framework yields immediately a 
"good" notion of automata. 

In fact, as Courcelle has shown in [Cou89, Cou92], universal algebra provides a framework for 
a rich variety of trees. Clearly, that work inspired our notion of the algebra underlying feature 
trees. We introduce this notion in Section 2. Informally speaking, the operation composing 
feature trees in the algebra takes a record value and adds a record field containing another 
value to it. In a special case, this amounts to Nivat's notion of 'sum of trees' [Niv92]; thus, 
incidentally, we obtain an algebraic formalization hereof. 

To define feature automata as algebras, it is useful to consider the class of all finite trees 
whose nodes are labeled by node symbols, and whose edges are labeled by feature symbols. 
We call these multitrees.² Multitrees are of interest on their own, namely for representation
of knowledge with set-valued attributes [Rou88]. Thus, feature trees are multitrees with the 
restriction that features are "functional," i.e., all edges outgoing from the same node are labeled 
with different features. Feature automata recognize languages of multitrees, which are then 
cut down to recognize languages of feature trees. 

In Section 3, we will define feature automata and show the basic properties of this notion: 
closure under the Boolean operations and decidability of the emptiness problem. In order to 
restrict our study to finitely representable automata and yet to account for the infiniteness of 
the set of features $\mathcal{F}$, we introduce the notion of a finitary automaton: the number of states
is finite, and the evaluation of the automaton can be specified not only on single symbols, but
also on finite sets or on complements of finite sets of symbols. Thus, it could be specified by
saying either "the value of f ... for all symbols $f \in F$" or "the value of f ... for all symbols
$f \notin F$," where $F \subseteq \mathcal{F}$ is finite.

Roughly, a feature automaton reads a feature tree in two directions: along its branches (from 
the frontier to the root) and along the fan-out of each node (along all argument positions). This 
is necessary in order to account for the flexibility in the depth as well as in the out-degree of 
the nodes of feature trees. The first direction is standard for all automata over trees. In order 
to study its behavior in the latter direction, or what we call the local structure of the recognized 
language, we consider recognizable sets of feature trees of depth 1, called flat feature trees. 



² The unranked unordered trees studied in [Cou89] (the number of arguments of the nodes is not restricted, and
the arguments are not ordered) are a special case of multitrees, namely with just one feature. In the framework 
of [Cou89], however, recognizability by automata is strictly weaker than definability by equational systems, even 
if the set of node labels is finite. 
In Section 4, we define a class of logical formulas, called counting constraints. The name
comes from the fact that they express threshold or modulo counting of the subtrees which are 
accessed via features from a finite or co-finite set of features. That is, their occurrences are 
counted up to a certain number, or modulo a certain number. 

The main technical result of this paper is a theorem saying that counting constraints characterize 
exactly the recognizable sets of flat feature trees. The proof takes up Sections A and B. The
theorem essentially links counting and the finitary-condition; in all of the set-defining devices
presented here, either of these two notions accounts for the infiniteness of $\mathcal{F}$.

Counting constraints can express that certain features exist in the flat feature tree (labeling 
edges from the root), and that others do not.³ As a consequence, one can show that the set of
first-order trees, with fixed arity assigned to node symbols, and recognizable subsets of these, 
are sets recognized by feature automata. 

In Sections 5 and 6, we give two alternative ways to define recognizable sets of feature trees 
which are more practical than automata: regular expressions and equational systems. In the 
first one, the sets are constructed by union, substitution and star, i.e., iterated substitution 
(and, optionally, complement or intersection). In the second, they are defined as solutions of 
equations in a certain form. For both, counting constraints can be used to define the base cases. 
Thanks to the main theorem in Section 4, we are able to show that either class of defined sets 
is equal to the one for feature automata. Moreover, the devices can be effectively translated 
one into the other. These results, together with the previous ones, are necessary to present 
a complete regular theory of feature trees and to offer a solution to the practical problem of 
computing with types denoting sets of feature trees as described above. 

2 The Algebra J 

In this section, we will introduce feature trees and the more general multitrees as elements of
an algebra that we define, called J. This yields the notion of a J-automaton. This section
follows the approach of [Cou89] and [Cou92].

In the following we will assume a given set $\mathcal{S}$ of node symbols⁴ (referred to by A, B, etc.) and
a given set $\mathcal{F}$ of feature symbols (also called attributes, or record field selectors, referred to by
f, g, etc.).

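Formally, multitrees are trees (i.e., finite directed acyclic rooted graphs) whose nodes are
labeled over $\mathcal{S}$, and whose edges are labeled over $\mathcal{F}$. Alternatively, the set $\mathcal{MT}$ of multitrees over $\mathcal{S}$
and $\mathcal{F}$ can be introduced as $\mathcal{MT} = \bigcup_{n \geq 0} \mathcal{MT}_n$, where (letting $\mathbb{N}$ denote the set of all natural
numbers, and $\mathbb{N}^{M}_{\mathrm{fin}}$ the set of finite multisets with elements from the set $M$):

$$\mathcal{MT}_0 = \{(A, \emptyset) \mid A \in \mathcal{S}\},$$

$$\mathcal{MT}_n = \{(A, E) \mid A \in \mathcal{S},\ E \in \mathbb{N}^{\mathcal{F} \times \mathcal{MT}_{n-1}}_{\mathrm{fin}}\} \cup \mathcal{MT}_{n-1}.$$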

³ In [ST92, Smo92], these correspond to the constraints $xF$, $xf\uparrow$ or their negations, where $F \subseteq \mathcal{F}$ is finite and
$f \in \mathcal{F}$.

⁴ In the literature on feature trees, the elements of $\mathcal{S}$ are usually called "sorts." In this text, we use "node
symbols" instead of "sorts" in order to avoid confusion with the notion of sorts of domains in universal algebra. 
$\mathcal{MT}_n$ contains the multitrees of depth $\leq n$.

Feature trees are multitrees such that all edges outgoing from the same node are labeled by
different features. Let $\mathcal{FT}$ denote the set of all feature trees (and $\mathcal{FT}_n$ the set of those of depth $\leq n$).
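The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that node symbols and features are Python strings and that the finite multiset of edges is stored as a sorted tuple of (feature, subtree) pairs; the names `node`, `depth` and `is_feature_tree` are hypothetical and not taken from the paper.

```python
def node(label, *edges):
    """Build a multitree (A, E): a node symbol plus a finite multiset of
    (feature, subtree) pairs. The multiset is stored as a sorted tuple,
    so the order in which edges are listed does not matter."""
    return (label, tuple(sorted(edges)))


def depth(tree):
    """Depth of a multitree: 0 for a single node."""
    _, edges = tree
    return 0 if not edges else 1 + max(depth(sub) for _, sub in edges)


def is_feature_tree(tree):
    """True if no node has two outgoing edges labeled by the same feature."""
    _, edges = tree
    features = [f for f, _ in edges]
    return len(features) == len(set(features)) and all(
        is_feature_tree(sub) for _, sub in edges
    )


# A record-like feature tree and a multitree with a set-valued attribute.
point = node("point", ("x", node("1")), ("y", node("2")))
family = node("person", ("child", node("ann")), ("child", node("bob")))
```

Here `point` is a feature tree of depth 1, while `family` is a multitree that is not a feature tree, because the feature `child` labels two edges leaving the root.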

We introduce two sorts MT and F for multitrees and features, respectively, and define the
{MT, F}-sorted signature:

$$\Sigma = \{\Rightarrow\} \uplus \mathcal{F} \uplus \mathcal{S}$$

where $\Rightarrow$ is a function symbol of profile $MT \times F \times MT \to MT$, and the symbols in $\mathcal{F}$ and $\mathcal{S}$
are constants of sort F and of sort MT, respectively.

The algebra of multitrees J is defined as a $\Sigma$-algebra. Its two domains are $D_{MT} = \mathcal{MT}$ and
$D_F = \mathcal{F}$ of the sorts MT and F, respectively. Its ternary function symbol $\Rightarrow$⁵ is interpreted in
J as the operation which composes two multitrees $t, t' \in \mathcal{MT}$ via a feature $f \in \mathcal{F}$ to a new
multitree composed of $t$ and $t'$ with an edge labeled $f$ from the root of $t$ to the root of $t'$. Or
(where $\sqcup$ denotes multiset union),

$$\Rightarrow^{J}((A, E), f, t) = (A, E \sqcup \{(f, t)\}).$$

Borrowing the 'tree sum' notation from [Niv92], we might write $\Rightarrow(t, f, t')$ more intuitively
as $t +_f t'$. In fact, for the special case where $\mathcal{F} = \{1, 2\}$ (the two features denoting left and
right successors), we obtain an algebraic reading of the notation of [Niv92].

The interpretation of the constants is given by $f^{J} = f$ and $A^{J} = (A, \emptyset)$.

It is easy to verify that the algebra J satisfies the order independence theory (OIT), i.e., the 
following equation is valid in J. 

$$\Rightarrow(\Rightarrow(x, f_1, x_1), f_2, x_2) = \Rightarrow(\Rightarrow(x, f_2, x_2), f_1, x_1) \qquad (1)$$

In the 'tree sum' notation this expresses the commutativity⁶ of +, in the sense that $t +_{f_1} t_1 +_{f_2} t_2 =
t +_{f_2} t_2 +_{f_1} t_1$. Of course, always $t +_{f_1} t_1 +_{f_2} t_2 \neq t +_{f_1} (t_1 +_{f_2} t_2)$.
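As a minimal sketch of the composition operation and of equation (1), assume again that a multitree is a pair of a label and a sorted tuple of (feature, subtree) pairs. The function name `arrow` is a hypothetical stand-in for the interpretation of the ternary function symbol in J.

```python
def arrow(t, f, t2):
    """Add an edge labeled f from the root of t to the root of t2.
    Sorting the edge tuple makes it behave like a multiset."""
    label, edges = t
    return (label, tuple(sorted(edges + ((f, t2),))))


A, B, C = ("A", ()), ("B", ()), ("C", ())

# Order independence (1): adding the f1-edge and the f2-edge in either order
# gives the same multitree.
left = arrow(arrow(A, "f1", B), "f2", C)
right = arrow(arrow(A, "f2", C), "f1", B)

# Re-bracketing changes the tree: here C hangs below B instead of below A.
nested = arrow(A, "f1", arrow(B, "f2", C))
```

With this representation `left == right` holds, while `left` and `nested` differ, matching the remark that the tree sum is commutative but that re-bracketing yields a different tree.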

We use $T_\Sigma$ to denote the free algebra of terms over the signature $\Sigma$.

Lemma 1 The algebra of multitrees J is isomorphic to the quotient of the free term algebra
over $\Sigma$ with the least congruence generated by the order-independence equation (1),

$$J \cong T_\Sigma / \mathrm{OIT}. \qquad \blacksquare$$

We note the well-known fact that, given any system of equations $\mathcal{E}$, $T_\Sigma / \mathcal{E}$ is the initial object
in the category of all $\Sigma$-algebras satisfying the equations $\mathcal{E}$.

A J-automaton is a tuple $(\mathcal{A}, h, Q_{\mathrm{final}})$ consisting of a finite $\Sigma$-algebra $\mathcal{A}$, a homomorphism
$h : J \to \mathcal{A}$ and the subset $Q_{\mathrm{final}} \subseteq D^{\mathcal{A}}_{MT}$ of values of sort MT ("final states") where the number
of values of sort MT and of sort F ("states") is finite. A J-automaton corresponds to the "more
concrete" notion of a (finite deterministic bottom-up) tree automaton over the terms of $T_\Sigma$
such that all terms which are equal modulo OIT are evaluated to the same state. This means
that any representation of a multitree $t$ as a term in $T_\Sigma$ can be chosen in order to calculate the
value of $t$.

⁵ We use the symbol $\Rightarrow$ in reminiscence of the notation for record descriptions like $\psi$-terms in [AK86, AKP91b],
which are of the form $\psi = X : s(f_1 \Rightarrow \psi_1, \ldots, f_n \Rightarrow \psi_n)$.

⁶ In a sense which can be made formal (cf., Section A), associativity also holds for +; this justifies dropping
the parentheses.



3 Feature Automata 

Given any many-sorted signature $\Sigma$ with a finite number of non-constant function symbols
$c \in (\Sigma - \Sigma^0_s)$ for every sort $s$, we define a $\Sigma$-algebra $\mathcal{A}$ to be finitary if, for each sort $s$ and
each value $q \in D^{\mathcal{A}}_s$ of sort $s$, the set:

$$\{c \in \Sigma^0_s \mid c^{\mathcal{A}} = q\}$$

of constant symbols in $\Sigma$ of sort $s$ which evaluate to $q$ is finite or co-finite.
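One possible finite representation of such an evaluation of constants is a table of finitely many exceptions plus a default state. With an infinite set of constants and finitely many states, the preimages cannot all be finite, and two co-finite subsets of an infinite set always intersect, so exactly one state has a co-finite preimage; that state plays the role of the default. The class `FinitaryMap` below is a hypothetical illustration of this idea, not a construction taken from the paper.

```python
class FinitaryMap:
    """Evaluation of constants given by finitely many exceptions and a default.
    The preimage of the default state is co-finite; all others are finite."""

    def __init__(self, exceptions, default):
        self.exceptions = dict(exceptions)
        self.default = default

    def __call__(self, symbol):
        return self.exceptions.get(symbol, self.default)

    def preimage(self, state):
        """Return (symbols, is_complement): the preimage is `symbols` itself,
        or its complement when is_complement is True."""
        if state == self.default:
            others = frozenset(s for s, q in self.exceptions.items() if q != state)
            return (others, True)
        return (frozenset(s for s, q in self.exceptions.items() if q == state), False)


# Evaluation of features for an automaton that only distinguishes "succ".
eval_feature = FinitaryMap({"succ": "p_succ"}, default="p_other")
```

For example, `eval_feature.preimage("p_other")` is the complement of the finite set containing only `succ`, which is co-finite.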

We now return to the particular {MT, F}-sorted signature $\Sigma$ introduced above; clearly, the
definitions below can be made in the general framework as well.⁷

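A feature automaton $\mathcal{A}$ is defined as a finitary J-automaton. The set of multitrees recognized
by $\mathcal{A}$ is the set:

$$L_{\mathcal{MT}}(\mathcal{A}) = \{t \in \mathcal{MT} \mid h(t) \in Q_{\mathrm{final}}\},$$

and the set of feature trees recognized by $\mathcal{A}$ is the set $L_{\mathcal{FT}}(\mathcal{A}) = L_{\mathcal{MT}}(\mathcal{A}) \cap \mathcal{FT}$. The
families $Rec_{\mathcal{MT}}(J)$ and $Rec_{\mathcal{FT}}(J)$ of recognizable sets of multitrees and feature trees are
defined accordingly.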

Remark. If (and only if) the set of features is infinite, the set of all feature trees is not a 
recognizable language of multitrees (with respect to J). 

Example. We will construct a feature automaton $\mathcal{A}$ that recognizes the set of natural numbers.
These are coded into the feature trees of the form $(0, \{(succ, (0, \{(succ, (\ldots (0, \emptyset) \ldots))\}))\})$,
with $n$ edges labeled $succ$ for the natural number $n$. As elements in the quotient term algebra
$T_\Sigma / \mathrm{OIT}$, they would be written as the singleton congruence classes $\{\Rightarrow(0, succ, \Rightarrow(0, succ, \Rightarrow
(\ldots, 0)))\}$. The feature automaton $\mathcal{A}$ has the states $Q = \{q_{nat}, q_{other}\}$ and $P = \{p_{succ}, p_{other}\}$
of sort MT and F, respectively. The evaluation is given by:

$$0^{\mathcal{A}} = q_{nat},$$

$$A^{\mathcal{A}} = q_{other} \quad \text{if } A \neq 0,$$

$$succ^{\mathcal{A}} = p_{succ},$$

$$f^{\mathcal{A}} = p_{other} \quad \text{if } f \neq succ,$$

$$\Rightarrow^{\mathcal{A}}(q_{nat}, p_{succ}, q_{nat}) = q_{nat},$$

$$\Rightarrow^{\mathcal{A}}(q_1, p, q_2) = q_{other} \quad \text{otherwise.}$$

As final state set we choose $Q_{\mathrm{final}} = \{q_{nat}\}$. It is clear that $\mathcal{A}$ respects the order independence
theory and the finitary-condition. Of course, it will be more practical to define this set by
regular expressions or equational systems.
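The example automaton can be run directly. The sketch below assumes that multitrees are pairs of a label and a tuple of (feature, subtree) pairs, and that states are plain strings; the function names (`eval_label`, `eval_feature`, `arrow`, `evaluate`, `recognizes`, `encode`) are illustrative choices, not notation from the paper.

```python
Q_NAT, Q_OTHER = "q_nat", "q_other"
P_SUCC, P_OTHER = "p_succ", "p_other"


def eval_label(a):
    """Evaluation of node symbols: only "0" goes to q_nat."""
    return Q_NAT if a == "0" else Q_OTHER


def eval_feature(f):
    """Evaluation of features: only "succ" goes to p_succ."""
    return P_SUCC if f == "succ" else P_OTHER


def arrow(q1, p, q2):
    """Transition of the example automaton."""
    return Q_NAT if (q1, p, q2) == (Q_NAT, P_SUCC, Q_NAT) else Q_OTHER


def evaluate(tree):
    """Bottom-up evaluation h(t). Because the automaton satisfies OIT,
    the order in which the edges are folded in does not matter."""
    label, edges = tree
    state = eval_label(label)
    for f, sub in edges:
        state = arrow(state, eval_feature(f), evaluate(sub))
    return state


def is_feature_tree(tree):
    _, edges = tree
    features = [f for f, _ in edges]
    return len(features) == len(set(features)) and all(
        is_feature_tree(sub) for _, sub in edges
    )


def recognizes(tree):
    """Membership in the recognized language of feature trees."""
    return is_feature_tree(tree) and evaluate(tree) == Q_NAT


def encode(n):
    """Feature tree coding the natural number n with n succ-edges."""
    tree = ("0", ())
    for _ in range(n):
        tree = ("0", (("succ", tree),))
    return tree
```

Note that a root labeled 0 with two `succ` edges to 0 is evaluated to `q_nat` as a multitree, but it is excluded from the language of feature trees by the intersection with the set of all feature trees.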



⁷ Also, the finitary-condition (finite or co-finite) could be made more general such that the proof of Theorem 1
still holds. 
The following theorem and corollary state that the standard properties of recognizable
languages are valid for the sets in $Rec_{\mathcal{FT}}(J)$ as well.

Theorem 1 

1. The family of recognizable languages of feature trees $Rec_{\mathcal{FT}}(J)$ is closed under the Boolean
operations. The corresponding feature automata can be given effectively.

2. The emptiness problem ($L_{\mathcal{FT}}(\mathcal{A}) = \emptyset$) is decidable for each feature automaton $\mathcal{A}$.

Proof. The known constructions for Boolean operations on automata are still valid for
J-automata. To see that the finitary-condition is preserved, simply note that the system of
finite and co-finite sets is closed under the Boolean operations and, for two states $q_1$ and $q_2$ of the
feature automata $\mathcal{A}_1$ and $\mathcal{A}_2$, respectively,

$$\{c \in \Sigma^0_s \mid c^{(\mathcal{A}_1, \mathcal{A}_2)} = (q_1, q_2)\} = \{c \in \Sigma^0_s \mid c^{\mathcal{A}_1} = q_1\} \cap \{c \in \Sigma^0_s \mid c^{\mathcal{A}_2} = q_2\}.$$

Since $J \cong T_\Sigma / \mathrm{OIT}$, each J-automaton $\mathcal{A}$ corresponds to a tree automaton $\mathcal{A}_T$ over terms in
$T_\Sigma$, and:

$$L_{\mathcal{MT}}(\mathcal{A}) = \emptyset \quad \text{iff} \quad L_{T_\Sigma}(\mathcal{A}_T) = \emptyset,$$

it suffices to decide the emptiness problem for the tree automaton $\mathcal{A}_T$. As usual, this can be
done by checking all terms of depth smaller than the number of states of $\mathcal{A}_T$. Let $C$ be some
finite set of constants that contains, for each state $q$ which is the value of some constant, a
constant $c$ with $c^{\mathcal{A}} = q$. If (and only if) the language is not empty, it contains a term of bounded
depth that is constructed with constants of $C$ and non-constant function symbols. But there are
only finitely many terms of this kind.

A finitary automaton can be finitely represented. From such a representation one can calculate
some set $C$ as described above. This yields an algorithm for testing $L_{\mathcal{MT}}(\mathcal{A}) = \emptyset$. In the case
of $L_{\mathcal{FT}}(\mathcal{A})$ the algorithm checks only terms representing feature trees. □
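A sketch of the emptiness test for the multitree case follows. Instead of enumerating all terms of bounded depth as in the proof, it uses the standard equivalent formulation for bottom-up tree automata: saturate the set of reachable states and check whether a final state is among them. It assumes that the finite representation already provides the states that are values of some node symbol (`label_states`) and of some feature (`feature_states`); these parameter names are hypothetical. The feature tree case, which must additionally respect the restriction to pairwise different features, is not covered.

```python
def reachable_states(label_states, feature_states, arrow):
    """Least set of MT-states containing the values of the node symbols
    and closed under the transition function."""
    reached = set(label_states)
    changed = True
    while changed:
        changed = False
        for q1 in list(reached):
            for p in feature_states:
                for q2 in list(reached):
                    q = arrow(q1, p, q2)
                    if q not in reached:
                        reached.add(q)
                        changed = True
    return reached


def is_empty_mt(label_states, feature_states, arrow, final_states):
    """True iff no multitree is evaluated to a final state."""
    return reachable_states(label_states, feature_states, arrow).isdisjoint(final_states)


def nat_arrow(q1, p, q2):
    """Transition function of the natural-number example automaton."""
    if (q1, p, q2) == ("q_nat", "p_succ", "q_nat"):
        return "q_nat"
    return "q_other"
```

The loop terminates because the state set is finite and `reached` only grows.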

We conclude the section by defining non-deterministic feature automata which are needed in 
Sections 5 and 6. 

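Definition 1 A non-deterministic feature automaton $\mathcal{A} = (Q, P, h, Q_{\mathrm{final}})$ is a tuple such
that:

$Q$ is the set of states of sort MT, $P$ the set of states of sort F and $Q_{\mathrm{final}} \subseteq Q$ the set of final
states,

$h$ is composed of the functions $h : \mathcal{S} \to 2^Q$ and $h : \mathcal{F} \to 2^P$ and the transition function
$\Rightarrow^{\mathcal{A}} : Q \times P \times Q \to 2^Q$,

$\mathcal{A}$ satisfies OIT, i.e., for all states $q, p_1, q_1, p_2, q_2$,

$$\Rightarrow^{\mathcal{A}}(\Rightarrow^{\mathcal{A}}(q, p_1, q_1), p_2, q_2) = \Rightarrow^{\mathcal{A}}(\Rightarrow^{\mathcal{A}}(q, p_2, q_2), p_1, q_1),$$

and $\mathcal{A}$ satisfies the finitary-condition, i.e., for all states $p$ and $q$, the sets
$\{f \in \mathcal{F} \mid p \in f^{\mathcal{A}}\}$ and $\{A \in \mathcal{S} \mid q \in A^{\mathcal{A}}\}$ are finite or co-finite.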
The evaluation of the term $t \in T_\Sigma$ by $\mathcal{A}$, i.e., the set $h(t) \subseteq Q$, is defined inductively by:

$$h(\Rightarrow(t_1, f, t_2)) = \Rightarrow^{\mathcal{A}}(h(t_1), h(f), h(t_2)).$$

If $t_1$ and $t_2$ are congruent modulo OIT, we have $h(t_1) = h(t_2)$. Thus, $h([t]) = h(t)$ is well
defined for all congruence classes $[t]$. The language of multitrees recognized by $\mathcal{A}$ is:

$$L_{\mathcal{MT}}(\mathcal{A}) = \{[t] \mid h([t]) \cap Q_{\mathrm{final}} \neq \emptyset\},$$

and the language of feature trees recognized by $\mathcal{A}$ is $L_{\mathcal{FT}}(\mathcal{A}) = L_{\mathcal{MT}}(\mathcal{A}) \cap \mathcal{FT}$. Each
feature automaton is also a non-deterministic feature automaton.

Lemma 2 Given a non-deterministic feature automaton $\mathcal{A}$, an equivalent (deterministic)
feature automaton $\mathcal{A}^d$ can be constructed effectively.

Proof. We apply the usual subset construction to a given non-deterministic feature automaton
$\mathcal{A}$ of the form above, yielding the equivalent automaton $\mathcal{A}^d$ as follows: $Q^d = 2^Q$, $P^d = 2^P$,
$A^{\mathcal{A}^d} := A^{\mathcal{A}}$, $f^{\mathcal{A}^d} := f^{\mathcal{A}}$, and

$$\Rightarrow^{\mathcal{A}^d}(q_1^d, p^d, q_2^d) = \bigcup \{\Rightarrow^{\mathcal{A}}(q_1, p, q_2) \mid (q_1, p, q_2) \in q_1^d \times p^d \times q_2^d\}.$$

We define the final states of $\mathcal{A}^d$ by: $Q^d_{\mathrm{final}} = \{q^d \mid q^d \cap Q_{\mathrm{final}} \neq \emptyset\}$.

Clearly, the algebra $\mathcal{A}^d$ satisfies the OIT-axiom. The equality:

$$\{A \mid A^{\mathcal{A}^d} = q^d\} = \bigcap_{q \in q^d} \{A \mid q \in A^{\mathcal{A}}\} \cap \bigcap_{q \notin q^d} \{A \mid q \in A^{\mathcal{A}}\}^c$$

shows that the finitary-condition is preserved, too. □
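The transition function of the subset construction can be sketched as follows, assuming that states are hashable Python values, that deterministic states are frozensets of non-deterministic ones, and that the non-deterministic transition returns a set of states. The names `lift_arrow`, `is_final` and `nd_arrow` are hypothetical.

```python
def lift_arrow(nd_arrow):
    """Lift a non-deterministic transition (returning a set of states)
    to sets of states, as in the subset construction."""

    def arrow_d(qs1, ps, qs2):
        out = set()
        for q1 in qs1:
            for p in ps:
                for q2 in qs2:
                    out |= nd_arrow(q1, p, q2)
        return frozenset(out)

    return arrow_d


def is_final(state_set, final_states):
    """A subset state is final iff it contains some final state."""
    return not state_set.isdisjoint(final_states)


def nd_arrow(q1, p, q2):
    """Toy non-deterministic transition used only for illustration."""
    return {"q0", "q1"} if (p == "p" and q2 == "q0") else set()


arrow_d = lift_arrow(nd_arrow)
```

The lifted transition returns the empty frozenset whenever one of its arguments is empty, so the empty set acts as a sink state of the deterministic automaton.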

4 Counting Constraints 

In this section we characterize recognizable languages of flat feature trees using formulae of a
certain form, called counting constraints. The proof of this characterization, which is the main
technical result of this paper, will be done in Sections A and B.

The syntax of counting constraints C (written C(x) to indicate that x is the only free variable)
is defined in the BNF style as follows (where F is a finite or co-finite set of features, $n, m \in \mathbb{N}$
are natural numbers, and S and T are finite or co-finite subsets of $\mathcal{S}$).

$$
\begin{array}{rcl}
C(x) & ::= & \mathrm{card}\,\{\varphi \in F \mid \exists y.\ (x\varphi y \wedge Ty)\} = n \bmod m \\
     & \mid & Sx \\
     & \mid & C(x) \wedge C(x) \\
     & \mid & \neg C(x)
\end{array}
\qquad (2)
$$

The counting constraint $C(x) = \mathrm{card}\,\{\varphi \in F \mid \exists y.\ (x\varphi y \wedge Ty)\} = n \bmod m$ holds for the
multitree x if the number of all edges in x which: (1) go from the root to a node labeled by
a symbol in T and (2) are labeled by a feature $\varphi$ in F, is equal to $n \bmod m$.⁸ The cardinality
operator card is applied to a multiset of features, i.e., multiple occurrences of the same feature
are counted separately.

The counting constraint $C(x) = Sx$ holds for the multitree x if the root of x is labeled by
some symbol in S.

We note the following fact (cf., [Eil74]). 

Fact 1 A language of natural numbers is recognizable iff it can be decomposed into a finite
union of sets of the form $\{n + k \cdot m \mid k \in \mathbb{N}\}$, with $n, m \in \mathbb{N}$.
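Fact 1 gives a finite description of every recognizable set of natural numbers as a list of pairs (n, m). A minimal membership test under that description might look as follows; the function names and the list-of-pairs encoding are assumptions made for illustration.

```python
def in_progression(x, n, m):
    """Membership of x in {n + k*m | k in N}; for m = 0 this is just {n}."""
    if m == 0:
        return x == n
    return x >= n and (x - n) % m == 0


def in_recognizable(x, parts):
    """Membership in a finite union of progressions given as (n, m) pairs."""
    return any(in_progression(x, n, m) for n, m in parts)


EVEN = [(0, 2)]                   # 0, 2, 4, ...
ONE_OR_FIVE_PLUS_3K = [(1, 0), (5, 3)]   # {1} together with 5, 8, 11, ...
```

The case m = 0 expresses an exact count, and m > 0 expresses counting modulo m from the threshold n onwards.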

Thus, we can define the syntax of counting constraints equivalently in the form (where N is
a set of natural numbers which is recognizable in the monoid $(\mathbb{N}, +, 0)$; S and T are finite or
co-finite subsets of $\mathcal{S}$; F is a finite or co-finite set of features):

$$
\begin{array}{rcl}
C(x) & ::= & \mathrm{card}\,\{\varphi \in F \mid \exists y.\ (x\varphi y \wedge Ty)\} \in N \\
     & \mid & Sx \\
     & \mid & C(x) \wedge C(x) \\
     & \mid & C(x) \vee C(x)
\end{array}
\qquad (3)
$$

Note that this definition, too, yields immediately that counting constraints are closed under
negation. Indeed, $\neg\, \mathrm{card}\,\{\varphi \in F \mid \exists y.\ (x\varphi y \wedge Ty)\} \in N$ is equivalent to $\mathrm{card}\,\{\varphi \in
F \mid \exists y.\ (x\varphi y \wedge Ty)\} \in N^c$, and $\neg Tx$ is equivalent to $T^c x$.

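Some important feature constraints can be expressed by our new constraints. For example,
in the syntax of [Smo92], for $F \subseteq \mathcal{F}$ finite, for $f \in \mathcal{F}$, and for $A \in \mathcal{S}$: $xF$ ("for exactly the
features $f$ in $F$ there exists one edge labeled $f$ from the root"), $xf\uparrow$ ("there exists no edge
labeled $f$ from the root"), and $Ax$ ("the root is labeled by $A$").

$$xF \equiv \bigwedge_{f \in F} \mathrm{card}\,\{\varphi \in \{f\} \mid \exists y.\ x\varphi y\} \in \{1\} \ \wedge\ \mathrm{card}\,\{\varphi \in F^c \mid \exists y.\ x\varphi y\} \in \{0\},$$

$$xf\uparrow \ \equiv\ \mathrm{card}\,\{\varphi \in \{f\} \mid \exists y.\ x\varphi y\} \in \{0\},$$

$$Ax \equiv \{A\}x.$$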
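The three encodings above can be checked on concrete trees with a small evaluator. The sketch assumes that a finite or co-finite set is represented as a pair (elements, is_complement) and that a multitree is a pair of a label and a tuple of (feature, subtree) pairs; all function names are hypothetical.

```python
ALL = (frozenset(), True)  # complement of the empty set, i.e. every symbol


def member(x, fc_set):
    """Membership in a finite set or in the complement of a finite set."""
    elements, is_complement = fc_set
    return (x in elements) != is_complement


def card(tree, F, T=ALL):
    """Number of root edges labeled by a feature in F that lead to a node
    labeled by a symbol in T (edges are counted with multiplicity)."""
    _, edges = tree
    return sum(
        1 for f, (sub_label, _) in edges if member(f, F) and member(sub_label, T)
    )


def has_exactly_features(tree, features):
    """The constraint xF: one edge per feature in F and none outside F."""
    fs = frozenset(features)
    return all(card(tree, (frozenset({f}), False)) == 1 for f in fs) and (
        card(tree, (fs, True)) == 0
    )


def lacks_feature(tree, f):
    """The constraint stating that no edge labeled f leaves the root."""
    return card(tree, (frozenset({f}), False)) == 0


def root_is(tree, a):
    """The constraint Ax: the root is labeled by a."""
    return member(tree[0], (frozenset({a}), False))


point = ("point", (("x", ("1", ())), ("y", ("2", ()))))
```

For `point`, the constraint with F = {x, y} holds, while F = {x} fails because the edge labeled y falls into the co-finite complement of F.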

Each constraint C(x) defines the set $L_{\mathcal{MT}}(C)$ of multitrees x for which the constraint C(x)
holds. Accordingly, we define: $L_{\mathcal{FT}}(C) = L_{\mathcal{MT}}(C) \cap \mathcal{FT}$, $L_{\mathcal{MT}_1}(C) = L_{\mathcal{MT}}(C) \cap \mathcal{MT}_1$, and
$L_{\mathcal{FT}_1}(C) = L_{\mathcal{FT}}(C) \cap \mathcal{FT}_1$. The languages of flat multitrees of the form $L_{\mathcal{MT}_1}(C)$, or of flat
feature trees $L_{\mathcal{FT}_1}(C)$, are called counting-definable.

The following theorem holds for multitrees instead of feature trees, as well. 

Theorem 2 A language of flat feature trees is counting-definable iff it is recognizable (in J, 
by a feature automaton). 

⁸ We define n mod 0 = n, although this is not quite standard. That is, "counting" means here threshold- and
modulo counting. 
Proof Sketch. A flat multitree can be represented as a finite multiset over $(\mathcal{F} \cup \{root\}) \times \mathcal{S}$.
The operation $\Rightarrow$ corresponds to the union of such multisets. In Section A we study the
algebra $\mathcal{M}$ of finite multisets of pairs. It is three-sorted, the sorts denoting $\mathcal{F} \cup \{root\}$, $\mathcal{S}$ and
$\mathcal{MT}$, respectively. We show that J- and $\mathcal{M}$-recognizability coincide.

In Section B, we consider counting constraints D(x) for multisets x of $\mathcal{M}$. They are of the
form:

$$D(x) = \mathrm{card}\,\{(f, A) \in x \mid f \in F,\ A \in T\} \in N,$$

or conjunctions or disjunctions of these. Again F and T are finite or co-finite subsets of $\mathcal{F}$ and
$\mathcal{S}$ and N is a recognizable set of natural numbers.

We show that definability of languages of multisets by these constraints and $\mathcal{M}$-recognizability
coincide. The main idea is that the mapping:

$$x \mapsto \mathrm{card}\,\{(f, A) \in x \mid f \in F,\ A \in T\}$$

is essentially a homomorphism from $\mathcal{M}$ into $\mathbb{N}$. □
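The homomorphism property mentioned at the end of the proof sketch can be illustrated with Python's `collections.Counter` as a stand-in for finite multisets of (feature, node symbol) pairs. For simplicity the sketch assumes that F and T are given as finite Python sets; the co-finite case would only change the membership tests.

```python
from collections import Counter


def card(multiset, F, T):
    """Number of pairs (f, A) in the multiset, counted with multiplicity,
    such that f is in F and A is in T."""
    return sum(count for (f, a), count in multiset.items() if f in F and a in T)


x = Counter({("f", "A"): 2, ("g", "B"): 1})
y = Counter({("f", "A"): 1, ("h", "A"): 3})
F, T = {"f", "h"}, {"A"}

# Multiset union (Counter addition) is mapped to addition of natural numbers.
union_count = card(x + y, F, T)
sum_of_counts = card(x, F, T) + card(y, F, T)
```

Here both `union_count` and `sum_of_counts` equal 6, and the empty multiset is mapped to 0, the neutral element of addition.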

The theorem above expresses that feature automata can count features either up to a threshold
or modulo a natural number.



5 Kleene's Theorem 

We define regular expressions over feature trees. In generalization of the standard cases, the
atomic constituents of these are not just constants (denoting singletons or trees of depth 1), but
expressions which denote sets of feature trees of depth $\leq 1$.

As usual, we need construction variables labeling the nodes where the substitution and the
Kleene star operations can take place. These variables are taken from a set Y which is assumed
given (disjoint from $\mathcal{S}$). It is infinite; the definition of each regular language, of course, uses
only a finite number of construction variables. We call a syntactic expression C of the form (2)
a counting-expression if T ranges over finite or co-finite subsets of $\mathcal{S} \cup Y$. Its denotation is
defined as the set of all feature trees of depth $\leq 1$ which satisfy it as a counting constraint over
the extended alphabet of sorts.

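A regular expression R over $\mathcal{F}$ and $\mathcal{S} \cup Y$ is of the form given by:

$$
\begin{array}{rcll}
R & ::= & C & \text{C is a counting-expression} \\
  & \mid & R \cdot_y R & \text{concatenation (where } y \in Y\text{)} \\
  & \mid & R^{*_y} & \text{Kleene star (where } y \in Y\text{)} \\
  & \mid & R \cup R & \text{union}
\end{array}
$$

Complement and intersection are optional operators, which, as we will see, do not add
expressive power.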

The definition of the language $L_{\mathcal{FT}}(R)$ of feature trees (or $L_{\mathcal{MT}}(R)$ of multitrees) denoted by
the regular expression R is by straightforward induction. For concatenation and Kleene star for
sets of multitrees: If $L_1$ and $L_2$ are sets of feature trees, then $L_1 \cdot_y L_2$ is obtained by replacing
the construction variable y in the leaves of the trees of $L_1$ by (possibly different) trees of $L_2$.
The Kleene star operation on a set is an iterated concatenation of a set with itself. Formally,
for a set L of feature trees, $L^1_y = L$, $L^n_y := L^{n-1}_y \cdot_y L$, and $L^{*_y} = \bigcup_{n \geq 1} L^n_y$.
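Concatenation and a bounded approximation of the Kleene star can be sketched on finite sets of trees. The sketch assumes that trees are pairs of a label and a sorted tuple of (feature, subtree) pairs, that a construction variable is just a leaf label, and that the star is truncated after n iterations since the full star is in general infinite. The names `substitute`, `concat` and `star_up_to` are hypothetical.

```python
from itertools import product


def substitute(tree, y, replacements):
    """All trees obtained by replacing each leaf labeled y by some tree from
    `replacements`; different occurrences may get different trees."""
    label, edges = tree
    if not edges:
        return set(replacements) if label == y else {tree}
    choices = [
        [(f, new) for new in substitute(sub, y, replacements)] for f, sub in edges
    ]
    return {(label, tuple(sorted(combo))) for combo in product(*choices)}


def concat(L1, y, L2):
    """Concatenation of L1 and L2 at the construction variable y."""
    result = set()
    for tree in L1:
        result |= substitute(tree, y, L2)
    return result


def star_up_to(L, y, n):
    """Union of the first n powers of L, a finite approximation of the star."""
    power = set(L)
    result = set(L)
    for _ in range(n - 1):
        power = concat(power, y, L)
        result |= power
    return result


ZERO = ("0", ())
SUCC_Y = ("0", (("succ", ("y", ())),))
NAT_STEP = {ZERO, SUCC_Y}
```

With `NAT_STEP`, the approximations of the star contain the codings of the natural numbers, together with trees that still carry the construction variable y at their last leaf.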

The languages of feature trees (or multitrees) denoted by regular expressions are called regular
languages.

Theorem 3 (Kleene) A language of feature trees (or multitrees) is regular iff it is recognizable.

Proof. It is sufficient to prove the theorem for multitrees. We show by induction over the
structure of the regular expressions that the language of each regular expression over $\mathcal{S} \cup Y$
and $\mathcal{F}$ is recognizable. The base case R = C is handled by Theorem 2. Union is captured by
the Boolean closure properties in Theorem 1. Substitution and star are established using the
equivalence of deterministic and non-deterministic feature automata. For the other direction,
we use the standard McNaughton/Papert induction technique, the base case being handled
again by Theorem 2. □

6 Equational Systems 

The next possibility to define recognizable sets of feature trees (or multitrees) in a convenient
way uses equational systems. These systems again generalize the constituents from singletons
of trees of the form $a$ or $f(y_1, \ldots, y_n)$, for $a \in \Sigma_0$ and $f \in \Sigma_n$ in the case of a ranked signature
for first-order trees, to counting-expressions denoting (unions of) sets of flat feature trees.

The extra symbols $y \in Y$ in these counting-expressions now correspond to set variables of the
equations.

We write $C(y_1, \ldots, y_n)$ instead of C if the set variables of C are contained in the set $\{y_1, \ldots, y_n\}$.
These variables are not to be confused with the logical variable x used in C = C(x) as a
logical formula.

An equational system is a finite set $\mathcal{E}$ of equations of the form (where $C_i$ is a counting-
expression, for $i = 1, \ldots, n$):

$$y_i = C_i(y_1, \ldots, y_n).$$

Given an assignment, i.e., a mapping $\alpha : Y \to 2^{\mathcal{FT}}$, the equations in $\mathcal{E}$ are interpreted such
that $C_i(y_1, \ldots, y_n)$ denotes the set:

$$L_{\mathcal{FT}}(C_i) \cdot_{y_1} \alpha(y_1) \cdot_{y_2} \ldots \cdot_{y_n} \alpha(y_n).$$

A solution of $\mathcal{E}$ is an assignment $\alpha$ satisfying $\mathcal{E}$. Each equational system has a least solution.
The existence follows with the usual fixed point argument. Namely, an equational system is
considered as an operator over the lattice of assignments $\alpha$ and the least solution is obtained
in $\omega$ iteration steps of this operator, starting with the assignment $\alpha(y_i) = \emptyset$ for $i = 1, \ldots, n$.
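The fixed point iteration can be sketched on a simplified system. The sketch assumes that each right-hand side is given not as a counting-expression but as a finite list of flat trees whose leaves may be set variables, and it performs only a bounded number of iteration steps, since the least solution is in general an infinite set. The names `instantiate`, `step` and `iterate` are hypothetical.

```python
from itertools import product


def instantiate(flat_tree, assignment):
    """Replace each leaf that is a set variable by every tree currently
    assigned to it; leaves with other labels are kept unchanged."""
    label, edges = flat_tree
    choices = [
        [(f, t) for t in assignment.get(leaf_label, {(leaf_label, ())})]
        for f, (leaf_label, _) in edges
    ]
    return {(label, tuple(sorted(combo))) for combo in product(*choices)}


def step(system, assignment):
    """One application of the operator associated with the system."""
    return {
        y: set().union(*(instantiate(t, assignment) for t in rhs))
        for y, rhs in system.items()
    }


def iterate(system, n):
    """n iteration steps starting from the empty assignment."""
    assignment = {y: set() for y in system}
    for _ in range(n):
        assignment = step(system, assignment)
    return assignment


ZERO = ("0", ())
SUCC_Y = ("0", (("succ", ("y", ())),))
NAT_SYSTEM = {"y": [ZERO, SUCC_Y]}  # y = {0} or a 0-node with a succ-edge to y
```

For `NAT_SYSTEM`, the k-th iteration step yields exactly the codings of the natural numbers below k, so the union over all steps is the set of all natural numbers.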

A language of feature trees is called equational if it is the union of some of the sets $\alpha(y_i)$ for
the least solution $\alpha$ of $\mathcal{E}$. The notion is defined accordingly for multitrees.

We can now formulate the last characterization of recognizability: 
Theorem 4 A language of feature trees (or multitrees) is equational iff it is recognizable.

Proof. Since J-recognizability corresponds to the characterization by congruence relations,
and Theorem 2 covers the case of feature trees of depth $\leq 1$, the proof can be done following
the standard one for first-order trees (cf. [GS84]). □

7 Conclusion and Further Work 

The results of this paper together present a complete regular theory of feature trees. They offer 
a solution to the concrete practical problem of computing with types denoting sets of feature 
trees as described in the introduction. 

Now, it is interesting to investigate where, in the wide range of applications of first-order trees, 
feature trees can be useful in replacing or extending those. Since tree automata play a major 
role, either directly or just by underlying some other formalism, the regular theory of feature 
trees developed here is a prerequisite for this investigation. 

A more speculative application might be conceived as part of the compiler optimizer of 
the programming language LIFE [AKP91b]. Namely, unary predicates over feature trees 
defined by Horn clauses without multiple occurrences of variables define recognizable sets 
of feature trees. Now, satisfiability of the conjunction of two such predicates corresponds to 
non-emptiness of the intersection of the defined sets. When used in deep guards, entailment 
of a predicate by others of this kind corresponds to the subset relation on the defined sets of 
feature trees. 

We are curious to extend the developed theory in the following ways. First, we would like to 
find a logical characterization of the class of recognizable feature trees, extending the results 
of Doner, Thatcher/Wright and Courcelle [Don70, TW67, Cou90]. It will be interesting to 
combine second-order logic and the counting constraints introduced here, in order to account 
for the flexibility in the depth as well as in the out-degree of the nodes of feature trees. 

Also, in order to account for circular data structures, like, e.g., circular lists, it is necessary to 
consider infinite (rational) feature trees. Thus, it would be useful to construct a regular theory 
of these. 

Finally, in [CD91] it is shown that the first-order theory of a tree automaton is decidable (in the 
case of a finite signature). More precisely, it is possible to solve first-order formulas built up 
from equalities between first-order terms and membership constraints of the form $x \in q$, where
q denotes a set defined by a tree automaton. Since we have established the corresponding 
automaton notion, we may hope to obtain the corresponding result for feature trees. For the 
special case of the set of all feature trees, this is the decidability of first-order feature logic. 
A proof for infinite feature trees can be found in [BS92]. Can the techniques of that proof be 
combined with the ones of [CD91]? 

We add the fact, suggested by one of the referees, that the first-order theory of multitrees is not
decidable. This can be shown by employing a proof technique by Ralf Treinen [Trei92].

Appendix

A The Algebra of Multisets 

We will reduce $\mathcal{J}$-recognizability for languages of flat multitrees to a notion of recognizability
of finite multisets of pairs. The idea is to identify a flat multitree with a finite multiset of pairs,

$(A, E) = \{(\mathit{root}, A)\} \sqcup E$

where root is considered like an extra feature. Roughly, the operation of adding edges 
corresponds to the union operation on multisets. 
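
As a minimal sketch of this identification, a flat multitree can be modeled as a root symbol with a list of (feature, symbol) edges, and a finite multiset as a `collections.Counter`. The names `ROOT`, `to_multiset` and `add_edge` are hypothetical, and the sketch assumes that no feature is literally called "root".

```python
from collections import Counter
from typing import List, Tuple

ROOT = "root"  # hypothetical marker for the extra feature

Edge = Tuple[str, str]            # (feature, symbol)
FlatTree = Tuple[str, List[Edge]]  # (root symbol A, edges E)

def to_multiset(tree: FlatTree) -> Counter:
    """Map a flat multitree (A, E) to the multiset {(root, A)} joined with E."""
    root_symbol, edges = tree
    return Counter([(ROOT, root_symbol)]) + Counter(edges)

def add_edge(tree: FlatTree, feature: str, symbol: str) -> FlatTree:
    """Add one edge labeled `feature` leading to a leaf labeled `symbol`."""
    root_symbol, edges = tree
    return (root_symbol, edges + [(feature, symbol)])

t = ("point", [("x", "int"), ("x", "int")])
t_extended = add_edge(t, "y", "int")
```

Adding an edge on the tree side then amounts to adding a singleton multiset on the multiset side, since `Counter` addition sums multiplicities.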

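In all generality, we introduce the algebra $\mathcal{M} = \mathcal{M}(\mathcal{U}_1, \ldots, \mathcal{U}_n)$ of finite multisets over
$n$-tuples with components in given sets $\mathcal{U}_1, \ldots, \mathcal{U}_n$, for some $n \geq 1$. (Later, we will instantiate
$\mathcal{U}_1 = \mathcal{F} \cup \{\mathit{root}\}$ and $\mathcal{U}_2 = \mathcal{S}$.) $\mathcal{M}$ is $(n+1)$-sorted, over the sorts $s_1, \ldots, s_n$ and $FMS$, which
denote, respectively, the domains $D_{s_1} = \mathcal{U}_1, \ldots, D_{s_n} = \mathcal{U}_n$, and $D_{FMS} = \mathbb{N}_{\mathrm{finite}}^{\mathcal{U}_1 \times \cdots \times \mathcal{U}_n}$
(where $\mathbb{N}_{\mathrm{finite}}^{M}$ denotes the set of finite multisets over $M$).

The operations of $\mathcal{M}$ are the (associative and commutative) union of multisets and the
creation of a singleton multiset from $n$ elements, one for each component, i.e.,
$\langle u_1, \ldots, u_n \rangle^{\mathcal{M}} = \{(u_1, \ldots, u_n)\}$. Thus, they are mappings
$\sqcup^{\mathcal{M}} : D_{FMS} \times D_{FMS} \to D_{FMS}$ and $\langle\ \rangle^{\mathcal{M}} : \mathcal{U}_1 \times \cdots \times \mathcal{U}_n \to D_{FMS}$.

Formally, $\mathcal{M}$ is an algebra over the $\{s_1, \ldots, s_n, FMS\}$-sorted signature:

$\Sigma_{\mathcal{U}_1, \ldots, \mathcal{U}_n} = \mathcal{U}_1 \uplus \cdots \uplus \mathcal{U}_n \uplus \{\langle \cdot, \ldots, \cdot \rangle, \sqcup\}$

where the constants of sort $s_i$ are just the elements of $\mathcal{U}_i$, and the two function symbols have
the profile $\sqcup : FMS \times FMS \to FMS$ and $\langle\ \rangle : s_1 \times \cdots \times s_n \to FMS$.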

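Lemma 3 The algebra $\mathcal{M}$ is isomorphic to the quotient of the term algebra with the
congruence generated by the associativity and commutativity laws for $\sqcup$,

$\mathcal{M} \cong T_{\Sigma_{\mathcal{U}_1, \ldots, \mathcal{U}_n}} / \mathrm{AC}$.

We define a subset of $D_{FMS}$ of multisets of $n$-tuples to be recognizable if it is recognized by a
finitary $\mathcal{M}$-automaton.

It is important to note that the notions of recognizability in $\mathcal{M} = \mathcal{M}(\mathcal{U}_1, \ldots, \mathcal{U}_n)$ and
$\mathcal{M}(\mathcal{U}_1 \times \cdots \times \mathcal{U}_n)$ can be different, namely if $n \geq 2$ and one of the $\mathcal{U}_i$ is infinite.⁹

Now, we consider the special case where $\mathcal{U}_1 = \mathcal{F} \cup \{\mathit{root}\}$ and $\mathcal{U}_2 = \mathcal{S}$, i.e.,

$\mathcal{M} = \mathcal{M}(\mathcal{F} \cup \{\mathit{root}\}, \mathcal{S})$.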



⁹ Generally, the finiteness condition for $\mathcal{M}(\mathcal{U}_1 \times \cdots \times \mathcal{U}_n)$-automata is weaker than the one for $\mathcal{M}$-automata.
It may be strictly weaker since cartesian products of finite and co-finite sets need be neither finite nor co-finite.
For example, suppose $U$ to be an infinite set. The cartesian product $U \times \{1\}$ is neither finite nor co-finite as a
subset of $U \times \{0, 1\}$. Thus, the language of the singleton subsets of $U \times \{1\}$ is not recognizable in the algebra
$\mathcal{M}(U \times \{0, 1\})$, but it is with respect to $\mathcal{M} = \mathcal{M}(U, \{0, 1\})$. In fact, it is this finitary-condition which makes
the proofs so complicated and non-standard.

Thus, the domains of $\mathcal{M}$ are $D_{s_1} = \mathcal{F} \cup \{\mathit{root}\}$, $D_{s_2} = \mathcal{S}$, and $D_{FMS}$, the set of finite multisets
over $(\mathcal{F} \cup \{\mathit{root}\}) \times \mathcal{S}$. We define the injection:

$I : \mathcal{MT}_1 \to D_{FMS}$

by $I((A, E)) = \{(\mathit{root}, A)\} \sqcup E$. Thus (writing the operator $\sqcup^{\mathcal{M}}$ infix):

$I(\Rightarrow^{\mathcal{J}}(t, f, A)) = I(t) \sqcup^{\mathcal{M}} \langle f, A \rangle^{\mathcal{M}}$.

Lemma 4 (Reduction Lemma) A language $L$ of flat multitrees is recognizable in $\mathcal{J}$ iff the
language $I(L)$ of multisets of pairs is recognizable in $\mathcal{M}$.

Proof The difficult direction is from left to right. Given a finitary $\mathcal{J}$-automaton $(\mathcal{A}, h, Q_{\mathrm{final}})$,
where $D^{\mathcal{A}}_{MT} = Q$ and $D^{\mathcal{A}}_{F} = P$, we construct a finitary $\mathcal{M}$-automaton $(\mathcal{A}^*, h^*, Q_{\mathrm{final}})$ such that,
for all flat multitrees $t$:

$h^*(I(t)) = h(t)$. (4)

This is sufficient to show the recognizability of $I(L)$, since $I(L) = (h^*)^{-1}(Q_{\mathrm{final}}) \cap I(\mathcal{MT}_1)$, and
$I(\mathcal{MT}_1)$ is a recognizable set in $\mathcal{M}$.

We set $D^{\mathcal{A}^*}_{s_2} = Q$, $D^{\mathcal{A}^*}_{s_1} = P \cup \{p_{\mathit{root}}\}$, and (where $\mathit{Func}$ denotes the set of functions
generated by the functions $\Rightarrow^{\mathcal{A}}(\,\cdot\,, p, q)$; i.e., the smallest set containing these and closed
under composition):

$D^{\mathcal{A}^*}_{FMS} = \mathit{Func} \uplus Q \uplus \{q_\perp\}$.

The evaluation of A* is defined by (we write A * instead of and use the more intuitive 
infix notation): 

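$\langle p, q \rangle^{\mathcal{A}^*} = \Rightarrow^{\mathcal{A}}(\,\cdot\,, p, q)$,
$\langle p_{\mathit{root}}, q \rangle^{\mathcal{A}^*} = q$,
$h_1 \sqcup^{\mathcal{A}^*} h_2 = h_1 \circ h_2$,
$q \sqcup^{\mathcal{A}^*} h = h(q)$,
$h \sqcup^{\mathcal{A}^*} q = h(q)$,
$q \sqcup^{\mathcal{A}^*} q' = q_\perp$.

Every function in the interpretations taking $q_\perp$ as argument is again mapped to $q_\perp$. Precisely:

$q_\perp \sqcup^{\mathcal{A}^*} h = q_\perp$,
$h \sqcup^{\mathcal{A}^*} q_\perp = q_\perp$,
$q_\perp \sqcup^{\mathcal{A}^*} q = q_\perp$,
$q \sqcup^{\mathcal{A}^*} q_\perp = q_\perp$,
$\langle p, q_\perp \rangle^{\mathcal{A}^*} = q_\perp$,
$\langle p_{\mathit{root}}, q_\perp \rangle^{\mathcal{A}^*} = q_\perp$.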

Clearly, $\mathcal{A}^*$ is an AC-automaton, i.e., the operation $\sqcup^{\mathcal{A}^*}$ is associative and commutative. The
associativity is trivial for functions as arguments. The commutativity for functions follows
from the OIT-axiom by:



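$\Rightarrow^{\mathcal{A}}(\,\cdot\,, p, q) \sqcup^{\mathcal{A}^*} \Rightarrow^{\mathcal{A}}(\,\cdot\,, p_1, q_1)$
$= \Rightarrow^{\mathcal{A}}(\Rightarrow^{\mathcal{A}}(\,\cdot\,, p_1, q_1), p, q)$
$= \Rightarrow^{\mathcal{A}}(\Rightarrow^{\mathcal{A}}(\,\cdot\,, p, q), p_1, q_1)$
$= \Rightarrow^{\mathcal{A}}(\,\cdot\,, p_1, q_1) \sqcup^{\mathcal{A}^*} \Rightarrow^{\mathcal{A}}(\,\cdot\,, p, q)$.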



The proof for all possible cases is now easily established. 

The identity (4) is now easily verified. Finally, we note that the finitary-condition is preserved 
from A to A*. 

For the other direction, given a finitary $\mathcal{M}$-automaton $\mathcal{A}^*$, we will construct a finitary
$\mathcal{J}$-automaton $\mathcal{A}$ satisfying (4). This is sufficient, now, since $\mathcal{MT}_1$ is a recognizable set in $\mathcal{J}$. In
fact, we will construct an automaton in another algebra.¹⁰ Next, we will introduce this algebra.
We resume this proof after having proven Lemma 6.

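The algebra $\mathcal{J}_{local}$ of flat multitrees is obtained from the algebra $\mathcal{J}$ by restricting the domain
of the third argument from $\mathcal{MT}$ to $\mathcal{S}$ ($= \mathcal{MT}_0$), and the domain of the first from $\mathcal{MT}$ to
$\mathcal{MT}_1$, i.e., to flat multitrees instead of arbitrary ones.

That is, the algebra $\mathcal{J}_{local}$ is three-sorted with sorts $MT_1$, $F$ and $S$. The domains are given by
$D_{MT_1} = \mathcal{MT}_1$, $D_F = \mathcal{F}$, $D_S = \mathcal{S}$. The operation $\Rightarrow^{\mathcal{J}_{local}}$ is given by (where $E$ is a finite multiset
over pairs in $\mathcal{F} \times \mathcal{S}$):

$\Rightarrow^{\mathcal{J}_{local}}((A_1, E), f, A_2) = (A_1, E \sqcup \{(f, A_2)\})$

(which is equal to $\Rightarrow^{\mathcal{J}}((A_1, E), f, A_2)$). The signature of $\mathcal{J}_{local}$ is the disjoint union:

$\Sigma_{local} = \mathcal{S} \uplus \mathcal{F} \uplus \mathcal{S} \uplus \{\Rightarrow\}$.

Here, the symbols in $\mathcal{S}$ appear twice: they are supposed to be renamed apart. Firstly, they are
constants of sort $MT_1$, and secondly, they are constants of sort $S$. The different functionality is
made clear syntactically by writing $A_{MT_1}$ and $A_S$, with interpretations
$(A_{MT_1})^{\mathcal{J}_{local}} = (A, \emptyset) \in \mathcal{MT}_0 \subseteq \mathcal{MT}_1$ and $(A_S)^{\mathcal{J}_{local}} = A \in \mathcal{S}$.

The features are constants of sort $F$ and interpreted freely. The profile of the function symbol
$\Rightarrow$ in $\mathcal{J}_{local}$ is $\Rightarrow : MT_1 \times F \times S \to MT_1$.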

The algebra $\mathcal{J}_{local}$ satisfies the order independence theory (OIT); namely, for all flat multitrees
$t$, features $f_1, f_2$ and symbols $A_1, A_2$ the following holds:

$\Rightarrow^{\mathcal{J}_{local}}(\Rightarrow^{\mathcal{J}_{local}}(t, f_1, A_1), f_2, A_2) = \Rightarrow^{\mathcal{J}_{local}}(\Rightarrow^{\mathcal{J}_{local}}(t, f_2, A_2), f_1, A_1)$.

The following lemma states that we can use the more concrete notion of tree automata.

Lemma 5 $\mathcal{J}_{local}$ is isomorphic to a quotient term algebra,

$\mathcal{J}_{local} \cong T_{\Sigma_{local}} / \mathrm{OIT}$.

¹⁰ The motivation for the construction of yet another algebra is, roughly, the fact that a symbol $A \in \mathcal{S}$ occurs as a
root-labeling as well as a leaf-labeling; these two roles are distinguished in $\mathcal{J}$-automata, but not in $\mathcal{M}$-automata.

Again, we define recognizability in $\mathcal{J}_{local}$ in terms of finitary automata.

Lemma 6 A language of flat multitrees is recognizable in $\mathcal{J}$ iff it is recognizable in $\mathcal{J}_{local}$.

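Proof We will first modify a finitary $\mathcal{J}$-automaton $\mathcal{A}$, where $D^{\mathcal{A}}_{MT} = Q$ and $D^{\mathcal{A}}_{F} = P$, in order
to obtain a finitary $\mathcal{J}_{local}$-automaton $\mathcal{A}^1$ such that the two automata (with the same set of final
states) will recognize the same languages of flat multitrees. We define the domains of $\mathcal{A}^1$ by:

$D^{\mathcal{A}^1}_{MT_1} = Q$,
$D^{\mathcal{A}^1}_{S} = Q$,
$D^{\mathcal{A}^1}_{F} = P$,

and we define the evaluation of $\mathcal{A}^1$ by (for all $A \in \mathcal{S}$, $f \in \mathcal{F}$, and for all $q, q' \in Q$ and $p \in P$):

$(A_{MT_1})^{\mathcal{A}^1} = A^{\mathcal{A}}$,
$(A_S)^{\mathcal{A}^1} = A^{\mathcal{A}}$,
$f^{\mathcal{A}^1} = f^{\mathcal{A}}$,
$\Rightarrow^{\mathcal{A}^1}(q, p, q') = \Rightarrow^{\mathcal{A}}(q, p, q')$.

Clearly the finitary-condition and the order independence theory are preserved between $\mathcal{A}^1$
and $\mathcal{A}$.

For the other direction, given a finitary $\mathcal{J}_{local}$-automaton $\mathcal{A}^2$ (with final states $Q^2_{\mathrm{final}}$, of sort
$MT_1$), we will define a finitary $\mathcal{J}_{local}$-automaton $\mathcal{A}^1$ that recognizes the same language, but
has the two properties: $D^{\mathcal{A}^1}_{MT_1} = D^{\mathcal{A}^1}_{S}$, and, for all symbols $A$ in $\mathcal{S}$, $(A_{MT_1})^{\mathcal{A}^1} = (A_S)^{\mathcal{A}^1}$.
Thanks to these, one can define a $\mathcal{J}$-automaton $\mathcal{A}$ that accepts the same flat multitrees as $\mathcal{A}^1$.
Again, this is sufficient since the language $\mathcal{MT}_1$ is recognizable with respect to $\mathcal{J}$.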

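We define the domains of $\mathcal{A}^1$ by:

$D^{\mathcal{A}^1}_{MT_1} = D^{\mathcal{A}^2}_{MT_1} \times D^{\mathcal{A}^2}_{S}$,
$D^{\mathcal{A}^1}_{S} = D^{\mathcal{A}^2}_{MT_1} \times D^{\mathcal{A}^2}_{S}$,
$D^{\mathcal{A}^1}_{F} = D^{\mathcal{A}^2}_{F}$,

and, after having fixed an arbitrary element $r_{\mathrm{fix}} \in D^{\mathcal{A}^2}_{S}$, we define the evaluation of $\mathcal{A}^1$ by (for
all $A \in \mathcal{S}$, $f \in \mathcal{F}$, and for all $q, \tilde{q} \in D^{\mathcal{A}^2}_{MT_1}$, $p \in D^{\mathcal{A}^2}_{F}$ and $r, \tilde{r} \in D^{\mathcal{A}^2}_{S}$):

$(A_{MT_1})^{\mathcal{A}^1} = ((A_{MT_1})^{\mathcal{A}^2}, (A_S)^{\mathcal{A}^2})$,
$(A_S)^{\mathcal{A}^1} = ((A_{MT_1})^{\mathcal{A}^2}, (A_S)^{\mathcal{A}^2})$,
$\Rightarrow^{\mathcal{A}^1}((q, r), p, (\tilde{q}, \tilde{r})) = (\Rightarrow^{\mathcal{A}^2}(q, p, \tilde{r}), r_{\mathrm{fix}})$.

As final states of $\mathcal{A}^1$ we choose:

$Q^1_{\mathrm{final}} = \{(q, r) \mid q \in Q^2_{\mathrm{final}} \text{ and } r \in D^{\mathcal{A}^2}_{S}\}$.

Again, the finiteness condition and the order independence theory are preserved. This
concludes the proof of Lemma 6. □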

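End of Proof of Reduction Lemma 4

Given a finitary $\mathcal{M}$-automaton $\mathcal{A}^*$, we construct a finitary $\mathcal{J}_{local}$-automaton $\mathcal{A}$ such that
$(I(t))^{\mathcal{A}^*} = t^{\mathcal{A}}$ for all flat multitrees $t$. The domains of $\mathcal{A}$ are: $D^{\mathcal{A}}_{S} = D^{\mathcal{A}^*}_{s_2}$, $D^{\mathcal{A}}_{F} = D^{\mathcal{A}^*}_{s_1}$
and $D^{\mathcal{A}}_{MT_1} = D^{\mathcal{A}^*}_{FMS}$.

The evaluation of $\mathcal{A}$ is defined by (where $q$, $p$ and $r$ are states of $\mathcal{A}$ of sorts $MT_1$, $F$ and $S$):

$(A_S)^{\mathcal{A}} = A^{\mathcal{A}^*}$,
$f^{\mathcal{A}} = f^{\mathcal{A}^*}$,
$(A_{MT_1})^{\mathcal{A}} = \langle \mathit{root}^{\mathcal{A}^*}, (A_S)^{\mathcal{A}^*} \rangle^{\mathcal{A}^*}$,
$\Rightarrow^{\mathcal{A}}(q, p, r) = q \sqcup^{\mathcal{A}^*} \langle p, r \rangle^{\mathcal{A}^*}$.

Since $\mathcal{A}^*$ satisfies (AC), $\mathcal{A}$ satisfies (OIT). The finitary-condition is preserved, as well. □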



B Counting in Multisets 

Going back to the general framework where $\mathcal{M} = \mathcal{M}(\mathcal{U}_1, \ldots, \mathcal{U}_n)$, we will now characterize
recognizability in $\mathcal{M}$, i.e., of languages of finite multisets, by appropriate counting constraints.

We define $\mathcal{M}$-counting constraints $C$ (written $C(x)$ to indicate that $x$ is the only free variable,
which is, logically, a multiset variable) to be expressions of the following form:

$C(x) ::= \mathrm{card}\{(u_1, \ldots, u_n) \in x \mid u_i \in U_i \text{ for all } i\} \in N$
$\quad \mid\ C(x) \cap C(x)$
$\quad \mid\ C(x) \cup C(x)$.

Here, $N$ is a recognizable set of natural numbers with respect to the monoid $(\mathbb{N}, +, 0)$, and the
sets $U_i \subseteq \mathcal{U}_i$ are finite or co-finite. The counting constraint
$C(x) = \mathrm{card}\{(u_1, \ldots, u_n) \in x \mid u_i \in U_i \text{ for all } i\} \in N$ holds for the multiset $x$ if the number
of tuples $(u_1, \ldots, u_n)$ in $x$ such that $u_i \in U_i$ for all $i = 1, \ldots, n$ is an element of $N$. The
cardinality operator card applies to a multiset of tuples, i.e., it counts multiple occurrences.

The language defined by an $\mathcal{M}$-counting constraint $C(x)$ is the set of all finite multisets $x$ that
satisfy $C(x)$. It is denoted by $L_{\mathcal{M}}(C)$.
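
The semantics of counting constraints can be sketched as follows. The sketch assumes that finite multisets are `collections.Counter` objects, that a finite or co-finite set is given by a finite set plus a flag, and that the recognizable set $N$ is supplied as a membership predicate. All names (`FinCofin`, `card`, `atom`, `both`, `either`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

@dataclass(frozen=True)
class FinCofin:
    """A finite set, or the complement of a finite set if cofinite is True."""
    elems: FrozenSet[str]
    cofinite: bool = False

    def __contains__(self, u: object) -> bool:
        return (u in self.elems) != self.cofinite

Constraint = Callable[[Counter], bool]

def card(x: Counter, sets: Tuple[FinCofin, ...]) -> int:
    """Count, with multiplicity, the tuples of x whose i-th component lies in sets[i]."""
    return sum(mult for tup, mult in x.items()
               if all(u in s for u, s in zip(tup, sets)))

def atom(sets: Tuple[FinCofin, ...], in_n: Callable[[int], bool]) -> Constraint:
    return lambda x: in_n(card(x, sets))

def both(c1: Constraint, c2: Constraint) -> Constraint:
    return lambda x: c1(x) and c2(x)

def either(c1: Constraint, c2: Constraint) -> Constraint:
    return lambda x: c1(x) or c2(x)

# "An even number of pairs whose feature is not root and whose symbol is int."
even_ints = atom((FinCofin(frozenset({"root"}), cofinite=True),
                  FinCofin(frozenset({"int"}))),
                 lambda k: k % 2 == 0)
# "Exactly one pair with first component root."
one_root = atom((FinCofin(frozenset({"root"})),
                 FinCofin(frozenset(), cofinite=True)),
                lambda k: k == 1)

x = Counter({("root", "point"): 1, ("x", "int"): 2, ("y", "real"): 1})
```

The two predicates used for $N$ (the even numbers and the singleton {1}) are ultimately periodic, which is the shape of the recognizable subsets of $(\mathbb{N}, +, 0)$.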

Theorem 5 The family of languages defined by $\mathcal{M}$-counting constraints is exactly the family
of languages of multisets recognizable in $\mathcal{M}$.

Proof. Given an $\mathcal{M}$-counting constraint of the form
$C = \mathrm{card}\{(u_1, \ldots, u_n) \in x \mid u_i \in U_i \text{ for all } i\} \in N$, we will show the recognizability of $L_{\mathcal{M}}(C)$.

We can define a homomorphism $h : \mathcal{M}(\mathcal{U}_1, \ldots, \mathcal{U}_n) \to \mathcal{M}(\{0, 1\}, \ldots, \{0, 1\})$ by setting $h(u_i) = 1$
for $u_i \in U_i$, and $h(u_i) = 0$ otherwise.

Furthermore, the homomorphism $j : \mathbb{N}_{\mathrm{finite}}^{\{0,1\}^n} \to \mathbb{N}$, given by $j(\{(u_1, \ldots, u_n)\}) = 1$
if $(u_1, \ldots, u_n) = (1, \ldots, 1)$, and $j(\{(u_1, \ldots, u_n)\}) = 0$ otherwise, identifies a multiset consisting of $k$ tuples
$(1, \ldots, 1)$ with $k \in \mathbb{N}$.

Thus, for all finite multisets of $n$-tuples $x \in D_{FMS}$,

$j(h(x)) = \mathrm{card}\{(u_1, \ldots, u_n) \in x \mid u_i \in U_i \text{ for all } i\}$.

Hence, $L_{\mathcal{M}}(C) = h^{-1}(j^{-1}(N))$. The finitary-condition is invariant under inverse images of
homomorphisms. Thus, $L_{\mathcal{M}}(C)$ is recognizable in $\mathcal{M}$.

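For the reverse inclusion, suppose that $L$ is recognized by a finitary $\mathcal{M}$-automaton $(\mathcal{A}, h, Q_{\mathrm{final}})$
with, say, the set $D^{\mathcal{A}}_{FMS} = \{q_1, \ldots, q_n\}$ of states of sort $FMS$.

The evaluation of the multiset $t$ by $\mathcal{A}$ leads to the state (written in a notation which is justified
by the fact that $\mathcal{A}$ satisfies (AC), even if $\sqcup^{\mathcal{A}}$ is taken over the empty multiset):

$t^{\mathcal{A}} = \bigsqcup^{\mathcal{A}}_{(u_1, \ldots, u_n) \in t} \langle u_1^{\mathcal{A}}, \ldots, u_n^{\mathcal{A}} \rangle^{\mathcal{A}}$.

We define the natural numbers $\alpha_t(i) = \mathrm{card}\{(u_1, \ldots, u_n) \in t \mid \langle u_1^{\mathcal{A}}, \ldots, u_n^{\mathcal{A}} \rangle^{\mathcal{A}} = q_i\}$ and
obtain (again thanks to (AC) being satisfied):

$t^{\mathcal{A}} = \bigsqcup^{\mathcal{A}}_{i=1}^{n} \bigsqcup^{\mathcal{A}}_{j=1}^{\alpha_t(i)} q_i$.

We define a mapping $\nu_t : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that $q_{\nu_t(i)} = \bigsqcup^{\mathcal{A}}_{j=1}^{\alpha_t(i)} q_i$. If $t \in L_{\mathcal{M}}(\mathcal{A})$,
then:

$\bigsqcup^{\mathcal{A}}_{i=1}^{n} q_{\nu_t(i)} \in Q_{\mathrm{final}}$. (5)

Generally, for a mapping $\mu : \{1, \ldots, n\} \to \{1, \ldots, n\}$, we define, for $i = 1, \ldots, n$, the set of
natural numbers:

$N_i^{\mu} = \{m \in \mathbb{N} \mid \bigsqcup^{\mathcal{A}}_{j=1}^{m} q_i = q_{\mu(i)}\}$.

We note that $\alpha_t(i) \in N_i^{\nu_t}$ for $i = 1, \ldots, n$. That is, $t$ is an element of the language defined by
the $\mathcal{M}$-counting constraint:

$\bigwedge_{i=1}^{n} \alpha_x(i) \in N_i^{\nu_t}$.

Vice versa, for each mapping $\mu$ satisfying the property (5), the language of the $\mathcal{M}$-counting
constraint:

$\bigwedge_{i=1}^{n} \alpha_x(i) \in N_i^{\mu}$

is contained in $L$. We get $L = L_{\mathcal{M}}(R)$ where $R$ is the $\mathcal{M}$-counting constraint:

$R = \bigvee_{\mu \text{ with (5)}} \bigwedge_{i=1}^{n} \alpha_x(i) \in N_i^{\mu}$.

Since the number of mappings $\mu$ with (5) is finite, it only remains to show that the constraints
used in $R$ are of the defined form. The constituents $\alpha_x(i)$ are admissible by the finitary-condition
of $\mathcal{A}$. Finally, we have to prove that the sets $N_i^{\mu}$ are recognizable with respect to
$(\mathbb{N}, +, 0)$. We will construct appropriate automata $\mathcal{A}_i^{\mu}$ from $\mathcal{A}$. We set $D^{\mathcal{A}_i^{\mu}} = Q$, $0^{\mathcal{A}_i^{\mu}} = \emptyset^{\mathcal{A}}$,
$1^{\mathcal{A}_i^{\mu}} = q_i$ and interpret the addition by $\sqcup^{\mathcal{A}}$. As final states we take the singleton $\{q_{\mu(i)}\}$. Then,
$\mathcal{A}_i^{\mu}$ recognizes $N_i^{\mu}$. □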
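
The last step can be made concrete by a minimal sketch. It assumes a finite state set with an associative, commutative operation `op` standing in for the union operation of the automaton, and it decides whether the m-fold combination of a state `q` with itself equals a given target state. Only the case m ≥ 1 is covered (the empty union is left out), and the names `power_profile` and `in_multiples` are hypothetical.

```python
from typing import Callable, Hashable, List, Tuple

State = Hashable

def power_profile(q: State, op: Callable[[State, State], State]) -> Tuple[List[State], int]:
    """Return (seq, start): seq[k - 1] is the k-fold combination of q with itself,
    and the sequence continues periodically by jumping back to index start."""
    seq: List[State] = []
    seen = {}
    cur = q
    while cur not in seen:
        seen[cur] = len(seq)
        seq.append(cur)
        cur = op(cur, q)
    return seq, seen[cur]

def in_multiples(m: int, target: State, seq: List[State], start: int) -> bool:
    """Decide whether the m-fold combination of q (m >= 1) equals target."""
    idx = m - 1
    if idx >= len(seq):
        period = len(seq) - start
        idx = start + (idx - start) % period
    return seq[idx] == target

# Two toy monoids on small integers (assumptions for illustration only):
capped_seq, capped_start = power_profile(1, lambda a, b: min(a + b, 3))
mod_seq, mod_start = power_profile(1, lambda a, b: (a + b) % 4)
```

Because the state set is finite, the sequence of multiples must eventually cycle, so every set of this kind is ultimately periodic.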

Proof of Theorem 2. 

For each language $L$ of flat multitrees defined by a counting constraint $C$ we will find an
$\mathcal{M}$-counting constraint $C'$ that defines $I(L)$, and vice versa.

Given a counting constraint for flat multitrees of the form:

$C(x) = \mathrm{card}\{\varphi \in F \mid \exists y.\, (x \varphi y \wedge T y)\} \in N$,

we set:

$C'(x) = \mathrm{card}\{(\varphi, y) \in x \mid \varphi \in F \wedge y \in T\} \in N$
$\quad \cap\ \mathrm{card}\{(\mathit{root}, y) \in x \mid y \in \mathcal{S}\} = 1$.

The case $C = Tx$ is obvious, as well as conjunction and disjunction.

For the other direction, given an $\mathcal{M}$-counting constraint $C'$ for finite multisets, we will
give a constraint $C$ such that $L_{\mathcal{MT}_1}(C) = I^{-1}(L_{\mathcal{M}}(C'))$, or, equivalently, $I(L_{\mathcal{MT}_1}(C)) =
L_{\mathcal{M}}(C') \cap I(\mathcal{MT}_1)$. We note that the multisets in $I(\mathcal{MT}_1)$ are the multisets containing
exactly one pair with first component root. Given the $\mathcal{M}$-counting constraint:

$C' = \mathrm{card}\{(\varphi, y) \in x \mid \varphi \in F \wedge y \in T\} \in N$,

we have to distinguish the two cases $\mathit{root} \notin F$ and $\mathit{root} \in F$. In the first case we set:

$C = \mathrm{card}\{\varphi \in F \mid \exists y.\, (x \varphi y \wedge T y)\} \in N$.

In the second case, we note that the set $N - 1 = \{n - 1 \mid n \in N \text{ and } n \geq 1\}$ is recognizable
with respect to $(\mathbb{N}, +, 0)$, and set:

$C = \mathrm{card}\{\varphi \in F - \{\mathit{root}\} \mid \exists y.\, (x \varphi y \wedge T y)\} \in N - 1$
$\quad \cap\ Tx$.

In either case C has the required property. 
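
The shift from $N$ to $N - 1$ in the second case can be checked on a small instance. The sketch below assumes that $N$ is given as a membership predicate and only covers a multitree whose root symbol lies in $T$, so that the root pair contributes exactly one to the count on the multiset side. The names `shift_down`, `count_pairs` and `count_edges` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

ROOT = "root"  # hypothetical marker for the extra feature

def shift_down(in_n: Callable[[int], bool]) -> Callable[[int], bool]:
    """Membership test for N - 1 = {n - 1 | n in N and n >= 1}."""
    return lambda k: in_n(k + 1)

def count_pairs(x: Counter, in_f, in_t) -> int:
    """Multiset side: count pairs (phi, y) with phi in F and y in T."""
    return sum(m for (phi, y), m in x.items() if in_f(phi) and in_t(y))

def count_edges(edges: List[Tuple[str, str]], in_f, in_t) -> int:
    """Tree side: count edges whose feature lies in F - {root} and whose leaf lies in T."""
    return sum(1 for (phi, y) in edges if phi != ROOT and in_f(phi) and in_t(y))

root_symbol = "point"
edges = [("x", "int"), ("y", "int"), ("z", "real")]
x = Counter([(ROOT, root_symbol)] + edges)

in_f = lambda phi: True                   # F contains root (here: everything)
in_t = lambda y: y in {"int", "point"}    # T contains the root symbol
in_n = lambda k: k == 3                   # N = {3}

multiset_side = in_n(count_pairs(x, in_f, in_t))
tree_side = in_t(root_symbol) and shift_down(in_n)(count_edges(edges, in_f, in_t))
```

Here the multiset side counts three pairs (the root pair and two edges), while the tree side counts two edges and tests membership in $N - 1$, so both sides agree.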

This concludes the proof of Theorem 2, since the reduction lemma (Lemma 4, page 13) and
the above theorem (Theorem 5) close the cycle from counting-definable languages $L$ of flat
feature trees to those recognizable in $\mathcal{J}$ by feature automata. Namely, according to the above
correspondence between counting- and $\mathcal{M}$-counting constraints, via $\mathcal{M}$-counting-definable
languages which, according to Theorem 5, are exactly the ones recognizable in $\mathcal{M}$, back
to $L$ according to Lemma 4. □